Traffic measurements and P2P

Q: How would you describe Internet traffic, based on the paper, and how sure are you?

A1 -- type of traffic: P2P has grown substantially in very few years to become the dominant type of traffic; it used to be Web traffic. We don't really know what has happened since this paper, but presumably second-generation P2P systems, e.g., BitTorrent, have taken off. We don't really know for locations other than UW either.

A2 -- variation in time: There is a clear diurnal cycle (at least at UW), though it is offset between P2P and WWW traffic. Presumably the cycle is also shifted across time zones around the world. But what about finer timescales -- how smooth is Internet traffic? Not as smooth as you might expect, which is the subject of an interesting diversion.

Self-similarity: we would expect from the Central Limit Theorem (CLT) that if we add up many independent sources of traffic (Poisson arrivals are a common "random" model) to get aggregate demand, the result would be quite smooth around an expected mean rate. Instead, real traffic shows bursts over a wide range of timescales (say, milliseconds to hours). This is self-similarity, discovered around 1993. (A small simulation contrasting the two behaviors is sketched at the end of these notes.)

A3 -- characteristics of transfers: P2P clearly has much larger transfers on average, by orders of magnitude. This is a consequence of the application -- applications drive networks! Both P2P and Web traffic have highly skewed distributions of transfer lengths. For the Web, both the popularity and the size of documents are Zipf (or power-law) distributed, so caching is not very effective; it looks more effective for P2P.

Zipf: Many natural phenomena (word frequency, city size) have a distribution where the nth element is weighted as 1/n, or more generally 1/n^k. k ~= 1 is the Zipf distribution; the more general form is a power law. These show up as a straight line on a log-log plot. They are common in networking and have implications for the overall system. In particular, the weight in the (unpopular) tail can dominate the weight of the popular items. This leads to strange situations, e.g., Web caching isn't very effective, because a few large, unpopular items blow the hit rate (see the second sketch below); and pinning a few large flows can control the routes of most of the bytes, even though most flows are short.

A4 -- characteristics on the wire: The skewed transfer sizes lead to a "mice and elephants" world, where most connections are short, but most of the bytes are in a few long flows. (The third sketch below quantifies this.)

Q: Why do any of these characteristics matter?

A: They have implications for the effective design of networks and content distribution systems. Caching was one (negative) result for the Web. Others are congestion control (how will it work if the average flow completes in less than one RTT?), network design (cable, with better downloads than uploads, is less appealing in a P2P world), and content distribution (how to deal with both popular and unpopular objects).
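Sketch 1 -- self-similarity. To make the CLT-versus-burstiness point concrete, here is a minimal simulation (not from the paper; the rates, Pareto shape, and ON/OFF structure are all made-up parameters). It averages a Poisson source and a heavy-tailed ON/OFF source over increasingly coarse time windows: the Poisson traffic smooths out quickly, as the CLT predicts, while the heavy-tailed traffic stays bursty across scales, which is the self-similar signature.

```python
import numpy as np

rng = np.random.default_rng(0)

def cv(series, scale):
    """Coefficient of variation after averaging over windows of `scale` slots."""
    n = len(series) // scale
    agg = series[: n * scale].reshape(n, scale).mean(axis=1)
    return agg.std() / agg.mean()

slots = 2 ** 16

# Poisson arrivals: aggregation smooths quickly, per the CLT.
poisson = rng.poisson(lam=10.0, size=slots).astype(float)

# ON/OFF source with heavy-tailed (Pareto) ON periods: bursts survive
# aggregation over a wide range of timescales. Shape 1.2 and the
# exponential OFF model are illustrative assumptions, not measurements.
heavy = np.zeros(slots)
t = 0
while t < slots:
    on = int(rng.pareto(1.2)) + 1           # heavy-tailed burst length
    off = int(rng.exponential(20.0)) + 1    # idle gap between bursts
    heavy[t : t + on] = 10.0
    t += on + off

for scale in (1, 16, 256):
    print(f"scale {scale:4d}: Poisson CV {cv(poisson, scale):.3f}, "
          f"heavy-tailed CV {cv(heavy, scale):.3f}")
```

Running it, the Poisson CV shrinks roughly as 1/sqrt(scale), while the heavy-tailed CV decays far more slowly -- bursts persist even after heavy aggregation.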
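Sketch 2 -- Zipf and caching. This directly instantiates the 1/n weighting from the Zipf paragraph, for a hypothetical catalogue of 100,000 documents. It computes the best-case hit rate of an idealized cache that pins the most popular items: because the harmonic-series tail keeps contributing weight, even a 100-fold increase in cache size buys only modest gains.

```python
import numpy as np

N = 100_000                       # catalogue size (made up)
ranks = np.arange(1, N + 1)
weight = 1.0 / ranks              # Zipf, k = 1: nth item weighted 1/n
weight /= weight.sum()

# Ideal cache pinning the top items: hit rate is the cached weight.
for frac in (0.001, 0.01, 0.10):
    top = int(N * frac)
    print(f"cache top {frac:.1%} of items -> hit rate {weight[:top].sum():.1%}")
```

For these parameters the hit rates come out near 43%, 62%, and 81%: growing the cache 100x yields less than a doubling of the hit rate, because the unpopular tail carries so much of the total weight.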
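Sketch 3 -- mice and elephants. A similar back-of-the-envelope check for A4: draw flow sizes from a heavy-tailed Pareto distribution (the shape parameter 1.1 is a guess standing in for "highly skewed", not a measured value) and see what fraction of bytes the largest flows carry.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical heavy-tailed flow sizes; elephants end up at the tail.
flows = rng.pareto(1.1, size=100_000) + 1.0
flows.sort()                      # ascending: largest flows last

total = flows.sum()
for frac in (0.01, 0.10):
    top = flows[-int(len(flows) * frac):].sum()
    print(f"largest {frac:.0%} of flows carry {top / total:.1%} of the bytes")
```

Most flows (the mice) sit near the minimum size, yet the few elephants dominate the byte count -- the property that makes pinning a few large flows an effective way to steer most of the traffic.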